Variogram analysis of the spatial genetic structure of continuous populations using multilocus microsatellite data
Abstract
A geostatistical perspective on spatial genetic structure may explain methodological issues of quantifying spatial genetic structure and suggest new approaches to address them. We use a variogram approach to (i) derive a spatial partitioning of molecular variance, gene diversity, and genotypic diversity for microsatellite data under the infinite allele model (IAM) and the stepwise mutation model (SMM), (ii) develop a weighting of sampling units to reflect ploidy levels or multiple sampling of genets, and (iii) show how variograms summarize the spatial genetic structure within a population under isolation-by-distance. The methods are illustrated with data from a population of the epiphytic lichen Lobaria pulmonaria, using six microsatellite markers. Variogram-based analysis not only avoids bias due to the underestimation of population variance in the presence of spatial autocorrelation, but also provides estimates of population genetic diversity and of the degree and extent of spatial genetic structure that account for autocorrelation.

Wagner et al.: Variogram analysis of genetic structure

INTRODUCTION

Methods for the analysis of spatial genetic structure have mostly been developed for single-locus, diploid genotypic data such as those provided by isozymes (SMOUSE and PEAKALL 1999). In contrast to isozymes, microsatellite data also contain information on the repeat numbers of individual gene copies. Microsatellite markers are often highly variable, and differences in allele size are interpreted in the light of alternative evolutionary models. Under the infinite allele model (IAM), any mutation is assumed to lead to a new allele, whereas under the stepwise mutation model (SMM), mutation is likely to increase or decrease the number of repeats at a microsatellite locus by one (BALLOUX and GOUDET 2002).
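As a minimal sketch of how the two mutation models treat the same data, the following Python function scores pairs of gene copies at one locus. Under IAM only identity matters (same allele or not), whereas under SMM the squared difference in repeat number is used, which is the quantity on which molecular-variance measures are built. The function name and the toy repeat numbers are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def mean_pairwise_dissimilarity(repeats, model):
    """Average dissimilarity over all pairs of gene copies at one locus.

    repeats: repeat numbers of the sampled gene copies (hypothetical data).
    model:   "IAM" scores 0 for identical alleles and 1 otherwise;
             "SMM" scores the squared difference in repeat number.
    """
    if model == "IAM":
        def score(a, b):
            return 0.0 if a == b else 1.0
    elif model == "SMM":
        def score(a, b):
            return float((a - b) ** 2)
    else:
        raise ValueError("model must be 'IAM' or 'SMM'")

    pairs = list(combinations(repeats, 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs)


# Toy example: four gene copies with 10, 10, 11 and 15 repeats.
toy_repeats = [10, 10, 11, 15]
iam_value = mean_pairwise_dissimilarity(toy_repeats, "IAM")
smm_value = mean_pairwise_dissimilarity(toy_repeats, "SMM")
```

In this toy sample the allele with 15 repeats counts as just one more distinct allele under IAM but dominates the SMM score, which illustrates why the two kinds of measure can diverge.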
Neither of these two extreme mutation models seems to fit microsatellite loci perfectly, so measures based on IAM and SMM are often reported together (BALLOUX and LUGON-MOULIN 2002). The difference between statistical measures (see below) under the two models is assumed to indicate the relative importance of mutation and drift (HARDY 2003).

Population genetic analyses are based on gene diversity under IAM (e.g., FST) and on molecular variance under SMM (e.g., RST). FST and RST quantify the differentiation of isolated populations assuming random mating within and restricted gene flow among populations. Both FST and RST can be adapted to pairwise comparisons, and Mantel tests are used to test the correlation with geographic distance between pairs of populations (HARDY and VEKEMANS 2002).

However, limited gene movement can cause isolation-by-distance effects even within continuous populations. The resulting spatial genetic structure within a population can be summarized by kinship coefficients for IAM (LOISELLE et al. 1995) or relationship coefficients for SMM (STREIFF et al. 1998). Kinship and relationship coefficients assess the similarity of homologous alleles between individuals and may be expressed as a function of geographic distance. Statistical tests for isolation by distance within continuous populations often involve either a Mantel test of Moran’s I (or related correlation coefficients, e.g., SMOUSE and PEAKALL 1999) or join-count statistics (EPPERSON 2003).

When assessing genetic diversity, it may be necessary to exclude comparisons of gene copies within individuals if they cannot be assumed to be independent. For organisms with variable ploidy levels within populations such as Taraxacum sp. (MEIRMANS et al. 2003; VAN DER HULST et al.
2003), individuals with a high ploidy level will receive more weight in the estimation of population genetic diversity than, e.g., diploid individuals unless ploidy level is accounted for. A similar problem arises for clonal organisms, where the multiple sampling of ramets from the same genetic individual (genet) can bias any measure of the genetic structure of a population (BALLOUX et al. 2003; HÄMMERLI and REUSCH 2003; PARKS and WERTH 1993). This is commonly taken into account by retaining a single sample per genet, either assuming the center of a clonal patch to be its origin or randomly selecting one sample per genet (CHUNG and EPPERSON 2000; HÄMMERLI and REUSCH 2003; REUSCH et al. 1999). Both approaches may, however, lead to a considerable loss of information and increased error in the description of the spatial genetic structure within populations.

VEKEMANS and HARDY (2004) identified some important problems and common misuses of spatial analysis in population genetics. (i) The spatial genetic structure is often described in terms of a maximum distance to which such structure extends. The common practice of assessing the extent of spatial genetic structure by the distance at which a Moran’s I correlogram reaches zero (e.g., EPPERSON 2003) is misleading, as this estimate depends strongly on the sampling design (VEKEMANS and HARDY 2004). (ii) The presence of non-random spatial genetic structure can be tested using Mantel permutation tests for a series of distance classes, and a Bonferroni correction is applied to account for multiple tests. VEKEMANS and HARDY (2004) caution that while the uncorrected test is too liberal, the correction makes it too conservative, and argue that this approach should not be used to determine the scales of spatial genetic structure, as the null hypothesis is only the overall absence of spatial genetic structure.
(iii) The amount of spatial genetic structure should not be assessed from the value (e.g., of Moran’s I) for the first distance class, as this absolute value depends strongly on the sampling design (FENSTER et al. 2003; VEKEMANS and HARDY 2004). (iv) Estimating biological parameters, such as dispersal distances, is only valid if the observed spatial genetic structure represents a true isolation-by-distance pattern at dispersal-drift equilibrium, thus assuming that the patterning results only from limited dispersal, that it has reached a stationary phase, and that the scale of the study is appropriate (VEKEMANS and HARDY 2004).

Moran’s I, Mantel tests and join-count statistics were borrowed from the general field of spatial statistics, originally developed, e.g., in geography, and adapted to population genetic data and questions as necessary. Other measures of spatial genetic structure, such as kinship or relationship coefficients, were developed specifically for genetic data and are little integrated with spatial statistical theory. However, many of the above problems are of a general nature and not specific to population genetics. Specifically, variogram modeling as developed in geostatistics may provide explanations and alternatives for the problems raised by VEKEMANS and HARDY (2004).

The term variogram refers to a plot of the semivariance (see below) against distance. The well-known Geary’s c correlogram is actually a standardized variogram (LEGENDRE and LEGENDRE 1998). Several population genetic measures and methods rely on the semivariance, namely the genetic distance measure by GOLDSTEIN et al. (1999) and the RST statistic (SLATKIN 1995). Nonetheless, variogram modeling is rare in population genetics.
PIAZZA and MENOZZI (1983) proposed a variogram of differences in allele frequencies between populations, and MONESTIEZ and GOULARD (1997) provided an application of multivariate geostatistical analysis to genetic data, but neither approach received much attention in the population genetic literature. WAGNER (2003, 2004) developed a formal integration of multivariate analysis and geostatistics in the context of plant community ecology. The crucial point of such an integration of spatial and non-spatial analysis is that the semivariance partitions the estimate of the population variance by distance class (WAGNER 2003). Hence, the semivariance can be used to partition the results of non-spatial analyses, such as population estimates of genetic diversity, by distance (multiscale ordination), and variograms can be interpreted in an ecologically more meaningful way.

This paper extends the spatial partitioning of variance to population genetic data and problems. The first section introduces key geostatistical concepts and methods and discusses the sensitivity of commonly used measures of autocorrelation and population variance. The methods section pursues three specific objectives: (i) to derive a spatial partitioning of measures of genetic diversity compatible with IAM and SMM, (ii) to develop a method for weighting sampling units to reflect different ploidy levels or multiple sampling of ramets within genets without data reduction, and (iii) to show how variogram modeling can be used for estimating population genetic parameters and summarizing the spatial genetic structure within populations. The methods are illustrated with a worked example (Appendix) and with an application to empirical microsatellite data from a population of the haploid, tree-colonizing (epiphytic) lichen Lobaria pulmonaria. We conclude with considerations for the robust estimation of the spatial genetic structure of continuous populations.
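The statement that the semivariance partitions the estimate of the population variance by distance class can be made concrete with a short identity for a single variable. The notation below is an assumption introduced for illustration (it is not copied from the paper): $x_i$ is the value at sampling unit $i$, $n$ the number of units, $N$ the total number of pairs, $N(d)$ the number of pairs in distance class $d$, and $\hat{\gamma}(d)$ the empirical semivariance of that class. The sketch assumes that the distance classes assign every pair of units to exactly one class.

```latex
\begin{align*}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
     = \frac{1}{n(n-1)}\sum_{i<j}(x_i-x_j)^2 \\
    &= \sum_{d}\frac{N(d)}{N}\,\hat{\gamma}(d),
\qquad
\hat{\gamma}(d)=\frac{1}{2N(d)}\sum_{(i,j)\in d}(x_i-x_j)^2,
\qquad
N=\frac{n(n-1)}{2}.
\end{align*}
```

Read this way, the variance estimate is a weighted average of the semivariances of all distance classes, with weights equal to the share of pairs in each class.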
A GEOSTATISTICAL PERSPECTIVE

Geostatistical concepts and methods

Spatial autocorrelation and stationarity. Spatial autocorrelation refers to the common phenomenon that nearby observations tend to be more similar than distant ones. Positive spatial autocorrelation is assumed to result from any kind of spatial process, such as pollen flow or seed dispersal in plants. The observed spatial autocorrelation can be quantified for various purposes (FORTIN et al. 2001), such as: (i) testing for the presence of autocorrelation, e.g., in order to meet assumptions for estimating population characteristics, (ii) assessing the range of autocorrelation, i.e., the distance beyond which observations are spatially independent, (iii) fitting a theoretical model in order to summarize the observed spatial structure, and (iv) inference about the underlying spatial process, such as dispersal distances and differences among populations. However, geostatistical analysis requires some assumption of stationarity, i.e., the structure of spatial autocorrelation must be the same throughout the study area. Specifically, it is common to assume weak stationarity, where the mean and the variance are constant and the autocorrelation depends only on the geographic distance between sampling units (BURROUGH 1995).

Correlograms and the empirical variogram. Geostatistics considers four statistical moments of a random variable: (i) its mean, (ii) variance, (iii) covariance, and (iv) semivariance (BURROUGH 1995). Spatial autocorrelation can be quantified based on covariance (Moran’s I) or semivariance (empirical variogram and Geary’s c correlogram). Correlograms are standardized through division by the sample variance (Moran’s I) or population variance (Geary’s c; CLIFF and ORD 1981):

Moran’s I:

$$I(d) = \frac{\dfrac{1}{W(d)} \sum_{i} \sum_{j \neq i} w_{ij}(d)\,(x_i - \bar{x})(x_j - \bar{x})}{\dfrac{1}{n} \sum_{i} (x_i - \bar{x})^2}$$

where $x_i$ is the value observed at sampling unit $i$, $\bar{x}$ is the mean over all $n$ units, $w_{ij}(d)$ equals 1 if units $i$ and $j$ fall into distance class $d$ and 0 otherwise, and $W(d)$ is the sum of the weights $w_{ij}(d)$.
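The relationship between the empirical variogram, Moran’s I and Geary’s c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper’s implementation: it handles a single quantitative variable, uses binary distance classes with half-open bounds (lower, upper], and assumes the usual textbook standardization (divisor n for Moran’s I, divisor n - 1 for Geary’s c). The function name, argument names and result keys are hypothetical.

```python
from itertools import combinations
from math import dist


def distance_class_stats(coords, x, edges):
    """Semivariance, Moran's I and Geary's c for each distance class.

    coords: list of coordinate tuples, one per sampling unit.
    x:      list of observed values, one per sampling unit.
    edges:  increasing class bounds; class k covers (edges[k], edges[k + 1]].
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    sum_sq_dev = sum(v * v for v in dev)
    var_n = sum_sq_dev / n          # divisor n, used for Moran's I
    var_n1 = sum_sq_dev / (n - 1)   # divisor n - 1, used for Geary's c

    classes = list(zip(edges[:-1], edges[1:]))
    # Per class: pair count, sum of squared differences, sum of cross products.
    sums = [[0, 0.0, 0.0] for _ in classes]
    for i, j in combinations(range(n), 2):  # each unordered pair once
        d = dist(coords[i], coords[j])
        for k, (lower, upper) in enumerate(classes):
            if lower < d <= upper:
                sums[k][0] += 1
                sums[k][1] += (x[i] - x[j]) ** 2
                sums[k][2] += dev[i] * dev[j]
                break

    results = []
    for (lower, upper), (pairs, sq, cross) in zip(classes, sums):
        if pairs == 0:
            results.append({"class": (lower, upper), "pairs": 0})
            continue
        gamma = sq / (2 * pairs)  # empirical semivariance
        results.append({
            "class": (lower, upper),
            "pairs": pairs,
            "semivariance": gamma,
            "moran_i": (cross / pairs) / var_n,
            "geary_c": gamma / var_n1,  # standardized semivariance
        })
    return results


# Toy transect: four units at positions 0..3 with made-up values.
toy_coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_values = [1.0, 2.0, 4.0, 7.0]
by_class = distance_class_stats(toy_coords, toy_values, [0.0, 1.0, 2.0, 3.0])
all_pairs = distance_class_stats(toy_coords, toy_values, [0.0, 10.0])[0]
```

When a single class contains all pairs, the semivariance equals the variance estimate with divisor n - 1, so Geary’s c is 1 and Moran’s I is -1/(n - 1). Departures from these values within individual distance classes are what a correlogram or variogram displays.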
Similar resources
Genetic analysis of pike-perch, Sander lucioperca L., populations revealed by microsatellite DNA markers in Iran
This study was conducted in order to investigate genetic diversity and population structure of pike-perch in the Northern part of Iran. For this purpose, 207 adult pike-perches from four regions of the Caspian Sea watershed (Talesh Coasts, Anzali Wetland, Chaboksar Coasts and Aras Dam) were collected. DNA was extracted and by using 15 pairs of microsatellite primers, Polymerase Chain Reaction (...
Genetic Heterogeneity among Leishmania major Isolates in Iran Determined by Restriction Fragment Length Polymorphism (RFLP) and Multilocus Microsatellite Typing (MLMT)
Background & Aims: In recent years, molecular methods for characterizing genetic heterogeneity have found a major place in modern approaches. In this study, two different molecular techniques including Restriction Fragment Length Polymorphism (RFLP) and Multi Locus microsatellite typing (MLMT) were carried out in order to evaluate genetic heterogeneity among isolates of Leishmania major in Iran...
Population structure of Rutilus frisii kutum in Iranian Coastline of the Caspian Sea using microsatellite markers
Kutum is considered one of the anadromous species of the Caspian Sea. Due to continuous population decline of the fish since 1975, Iranian fisheries organization started to restock this species. Nowadays, there is growing concern over the effects of restocking on natural populations. For this purpose, population structure and genetic variation of this species in the Iranian coastline of Casp...
Genetic Population Structure of Hawksbill Turtle (Eretmochelys imbricta) Using Microsatellite Analysis
Information on the genetic structure of marine species is essential for stock improvement programs. In order to analyze the genetic diversity of the Hawksbill turtle (Eretmochelys imbricata) by the microsatellite genetic method, 64 samples were caught from the beaches located in Kish and Qeshm islands. Polymerase chain reactions (PCR) of genomic DNA extracted from the samples wer...
Publication date: 2004